Measuring stability of feature ranking techniques: a noise-based approach
Authors
Abstract
One very common criterion used to evaluate feature selection methods is the performance of a chosen classifier trained with the selected features. Another important evaluation criterion that has, until recently, been neglected is the stability of these feature selection methods. While other studies have shown interest in measuring the degree of agreement between the outputs of a technique trained on randomly selected subsets from the same input data, this study presents the importance of evaluating stability in the presence of noise. Experiments are conducted with 17 filters (six standard filter-based ranking techniques and 11 threshold-based feature selection techniques) on nine different real-world datasets. This paper identifies the techniques that are inherently more sensitive to class noise and demonstrates how certain characteristics (sample size and class imbalance) of the data can affect the stability performance of some feature selection methods.
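The noise-based stability idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the ranker below (absolute difference of class means) is only a stand-in for the 17 filters studied, class noise is modelled as random label flipping on a binary problem, and stability is scored as the Jaccard overlap of the top-k features before and after noise injection. All function names are hypothetical.

```python
import random


def rank_features(X, y):
    """Rank feature indices, best first, by absolute difference of class means.

    Assumption: this simple filter stands in for the rankers in the study.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:
            scores.append(0.0)  # one class is empty, so the feature cannot be scored
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def inject_class_noise(y, rate, rng):
    """Return a copy of binary labels y with a fraction `rate` of them flipped."""
    noisy = list(y)
    n_flip = int(round(rate * len(y)))
    for i in rng.sample(range(len(y)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy


def top_k_overlap(rank_a, rank_b, k):
    """Jaccard similarity between the top-k features of two rankings."""
    a, b = set(rank_a[:k]), set(rank_b[:k])
    return len(a & b) / len(a | b)


def noise_stability(X, y, rate, k, repeats=10, seed=0):
    """Average top-k overlap between the clean ranking and noisy rankings."""
    rng = random.Random(seed)
    clean = rank_features(X, y)
    overlaps = []
    for _ in range(repeats):
        noisy_rank = rank_features(X, inject_class_noise(y, rate, rng))
        overlaps.append(top_k_overlap(clean, noisy_rank, k))
    return sum(overlaps) / len(overlaps)
```

Calling `noise_stability(X, y, rate=0.1, k=5)` on a list-of-lists dataset `X` with 0/1 labels `y` gives a value between 0 and 1. Values near 1 suggest the ranker's top-k selection is barely affected by that level of class noise. Repeating the call across noise rates, sample sizes, or class ratios would mimic the kind of comparison the abstract describes.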
Similar resources
Robustness of Threshold-Based Feature Rankers with Data Sampling on Noisy and Imbalanced Data
Gene selection has become a vital component in the learning process when using high-dimensional gene expression data. Although extensive research has been done towards evaluating the performance of classifiers trained with the selected features, the stability of feature ranking techniques has received relatively little study. This work evaluates the robustness of eleven threshold-based feature ...
Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques
Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been practiced in the computer industry since the 1940s and has been reviewed several times. An SDEE model is appropriate if it provides accuracy and confidence simultaneously before softwa...
Discrimination of Power Quality Distorted Signals Based on Time-frequency Analysis and Probabilistic Neural Network
Recognition and classification of Power Quality Distorted Signals (PQDSs) in power systems is an essential task. One of the noteworthy issues in Power Quality Analysis (PQA) is the identification of distorted signals using an efficient scheme. This paper proposes a Time–Frequency Analysis (TFA) for feature extraction, a so-called "hybrid approach", incorporating Multi Resolution Analysis...
Single Feature Ranking and Binary Particle Swarm Optimisation Based Feature Subset Ranking for Feature Selection
This paper proposes two wrapper based feature selection approaches, which are single feature ranking and binary particle swarm optimisation (BPSO) based feature subset ranking. In the first approach, individual features are ranked according to the classification accuracy so that feature selection can be accomplished by using only a few top-ranked features for classification. In the second appro...
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the signal's time-frequency representation (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing, deleted, and then replaced using the remaining components and statistical ...
Journal title:
- IJBIDM
Volume 7, Issue
Pages -
Publication date: 2012